Evaluating Subspace Clustering Algorithms
Authors
Abstract
Clustering techniques often define the similarity between instances using distance measures over the various dimensions of the data [12, 14]. Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. Traditional clustering algorithms consider all of the dimensions of an input dataset in an attempt to learn as much as possible about each instance described. In high-dimensional data, however, many of the dimensions are often irrelevant. These irrelevant dimensions confuse clustering algorithms by hiding clusters in noisy data. In very high dimensions it is common for all of the instances in a dataset to be nearly equidistant from each other, completely masking the clusters. Subspace clustering algorithms localize the search for relevant dimensions, allowing them to find clusters that exist in multiple, possibly overlapping subspaces. This paper presents a survey of the various subspace clustering algorithms. We then compare the two main approaches to subspace clustering using empirical scalability and accuracy tests.
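The claim that instances become nearly equidistant in very high dimensions can be illustrated with a small experiment. The following is a minimal sketch and not taken from the paper: it assumes points drawn uniformly at random from the unit hypercube and Euclidean distance, and the function name `relative_contrast` and its parameters are hypothetical. It reports how much the farthest point differs from the nearest one, relative to the nearest distance; a value close to zero means the distances are almost indistinguishable.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """Return (d_max - d_min) / d_min for Euclidean distances from one
    random query point to n_points random points in the unit hypercube.

    Assumption for illustration only: data are uniform and independent
    in every dimension, so no dimension carries cluster structure.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast should shrink as the number of dimensions grows.
    for dims in (2, 10, 100, 1000):
        print(dims, round(relative_contrast(200, dims), 3))
```

Under these assumptions the contrast shrinks as dimensions are added, which is the masking effect the abstract describes: a distance measure computed over all dimensions loses its ability to separate near neighbors from far ones.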
Similar resources
Subspace clustering using a symmetric low-rank representation
In this paper, we propose a low-rank representation with symmetric constraint (LRRSC) method for robust subspace clustering. Given a collection of data points approximately drawn from multiple subspaces, the proposed technique can simultaneously recover the dimension and members of each subspace. LRRSC extends the original low-rank representation algorithm by integrating a symmetric constraint ...
Hierarchical Subspace Clustering
It is well known that traditional clustering methods that consider all dimensions of the feature space usually fail in terms of efficiency and effectiveness when applied to high-dimensional data. This poor behavior stems from the fact that clusters may not be found in the high-dimensional feature space, although clusters exist in subspaces of the feature space. To overcome these limitations of tra...
Subspace MOA: Subspace Stream Clustering Evaluation Using the MOA Framework
Most available static data are becoming more and more high-dimensional. Therefore, subspace clustering, which aims at finding clusters not only within the full dimension but also within subgroups of dimensions, has gained significant importance. Recently, the OpenSubspace framework was proposed to evaluate and explore subspace clustering algorithms in WEKA with a rich body of most state of the a...
A Robust k-Means Type Algorithm for Soft Subspace Clustering and Its Application to Text Clustering
Soft subspace clustering algorithms are effective clustering techniques for high-dimensional datasets. Although several soft subspace clustering algorithms have been developed in recent years, their robustness should be further improved. In this work, a novel soft subspace clustering algorithm, RSSKM, is proposed. It is based on the incorporation of the alternative distance metric into the framework of kme...
Comparison of Subspace Projection Method with Traditional Clustering Algorithms for Clustering Electricity Consumption Data
Many studies use traditional clustering algorithms such as K-means, SOM and Two-Step to cluster electricity consumption data, either to define representative consumption patterns or to support further classification and prediction work. However, these approaches lack scalability in high dimensions. Nevertheless, they are widely used, because algorithms for clustering...
Publication date: 2004